1 Data

Load the student scores for the test - here we load the ETH Zurich test data, downloaded from https://pontifex.ethz.ch/s21t5/rate/

Show data summary
skim(test_scores)
Data summary
Name test_scores
Number of rows 9671
Number of columns 44
_______________________
Column type frequency:
character 3
numeric 41
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
test_version 0 1 3 4 0 2 0
year 0 1 4 4 0 5 0
class 0 1 13 13 0 5 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
A1_B1 0 1.00 0.97 0.21 0 1 1 1 2 ▁▁▇▁▁
A2_B0 6238 0.35 0.43 0.62 0 0 0 1 2 ▇▁▃▁▁
A3_B2 0 1.00 0.70 0.51 0 0 1 1 2 ▃▁▇▁▁
A4_B3 0 1.00 0.66 0.51 0 0 1 1 2 ▅▁▇▁▁
A5_B4 0 1.00 1.03 0.72 0 1 1 2 2 ▃▁▇▁▅
A6_B5 0 1.00 1.06 0.51 0 1 1 1 2 ▁▁▇▁▂
A7_B6 0 1.00 0.72 0.47 0 0 1 1 2 ▃▁▇▁▁
A8_B7 0 1.00 0.80 0.65 0 0 1 1 2 ▅▁▇▁▂
A9_B8 0 1.00 1.08 0.74 0 1 1 2 2 ▅▁▇▁▆
A10_B9 0 1.00 1.01 0.68 0 1 1 1 2 ▃▁▇▁▃
A11_B10 0 1.00 0.93 0.64 0 1 1 1 2 ▃▁▇▁▂
A12_B0 6238 0.35 0.90 0.54 0 1 1 1 2 ▂▁▇▁▁
A13_B0 6238 0.35 0.60 0.69 0 0 0 1 2 ▇▁▆▁▂
A14_B12 0 1.00 0.81 0.64 0 0 1 1 2 ▅▁▇▁▂
A15_B13 0 1.00 0.74 0.69 0 0 1 1 2 ▇▁▇▁▂
A16_B14 0 1.00 0.61 0.80 0 0 0 1 2 ▇▁▂▁▃
A17_B15 0 1.00 0.73 0.58 0 0 1 1 2 ▅▁▇▁▁
A18_B16 0 1.00 0.78 0.80 0 0 1 1 2 ▇▁▆▁▅
A19_B0 6238 0.35 0.95 0.81 0 0 1 2 2 ▇▁▇▁▇
A20_B17 0 1.00 0.83 0.50 0 1 1 1 2 ▂▁▇▁▁
A21_B18 0 1.00 0.92 0.69 0 0 1 1 2 ▅▁▇▁▃
A22_B19 0 1.00 1.00 0.70 0 1 1 1 2 ▃▁▇▁▃
A23_B20 0 1.00 0.90 0.75 0 0 1 1 2 ▆▁▇▁▅
A24_B21 0 1.00 0.89 0.64 0 0 1 1 2 ▃▁▇▁▂
A25_B0 6238 0.35 0.72 0.53 0 0 1 1 2 ▅▁▇▁▁
A26_B22 0 1.00 1.07 0.44 0 1 1 1 2 ▁▁▇▁▂
A27_B23 0 1.00 1.02 0.77 0 0 1 2 2 ▆▁▇▁▆
A28_B24 0 1.00 1.08 0.75 0 1 1 2 2 ▅▁▇▁▆
A29_B25 0 1.00 0.88 0.70 0 0 1 1 2 ▅▁▇▁▃
A30_B0 6238 0.35 0.90 0.50 0 1 1 1 2 ▂▁▇▁▁
A31_B27 0 1.00 0.63 0.69 0 0 1 1 2 ▇▁▆▁▂
A32_B28 0 1.00 0.70 0.84 0 0 0 1 2 ▇▁▃▁▃
A33_B29 0 1.00 0.83 0.46 0 1 1 1 2 ▂▁▇▁▁
A34_B0 6238 0.35 1.14 0.62 0 1 1 2 2 ▂▁▇▁▃
A35_B0 6238 0.35 1.31 0.76 0 1 1 2 2 ▃▁▅▁▇
A36_B0 6238 0.35 1.14 0.87 0 0 1 2 2 ▆▁▃▁▇
A0_B11 3433 0.65 0.70 0.57 0 0 1 1 2 ▅▁▇▁▁
A0_B26 3433 0.65 0.80 0.65 0 0 1 1 2 ▅▁▇▁▂
A0_B30 3433 0.65 1.04 0.63 0 1 1 1 2 ▂▁▇▁▃
A0_B31 3433 0.65 0.96 0.79 0 0 1 2 2 ▇▁▇▁▆
A0_B32 3433 0.65 1.37 0.81 0 1 2 2 2 ▃▁▃▁▇

The scores are:

  • 0 if the student gave an incorrect response
  • 1 if the student gave a correct response
  • 2 if the student chose the “I don’t know” answer option
  • NA if there was no response recorded

For this analysis, we replace the “2 = I don’t know” scores with 0, treating them as incorrect answers.

test_scores <- test_scores %>% 
  mutate(across(starts_with("A"), ~ ifelse(. == 2, 0, .)))

1.1 Data summary

The number of responses from each cohort:

test_scores %>% 
  group_by(year) %>% 
  tally() %>% 
  gt() %>% 
  data_color(
    columns = c("n"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  )
year n
2017 1682
2018 1751
2019 1999
2020 2191
2021 2048

The distribution of raw total scores is similar (and roughly normal) in each year:

total_scores <- test_scores %>% mutate(total = rowSums(across(where(is.numeric)), na.rm = TRUE))

total_scores %>%
  ggplot(aes(x = total)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(vars(year)) +
  theme_minimal()

Mean and standard deviation for each item (note this table is very wide - see the scroll bar at the bottom!):

test_scores %>% 
  select(-class, -test_version) %>% 
  group_by(year) %>% 
  skim_without_charts() %>% 
  select(-contains("character."), -contains("numeric.p"), -skim_type) %>% 
  rename(complete = complete_rate) %>% 
  # make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
  pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>% 
  # put the columns in order by year
  select(sort(names(.))) %>% 
  select(skim_variable, everything()) %>% 
  # use GT to make the table look nice
  gt(rowname_col = "skim_variable") %>% 
  # group the columns from each year
  tab_spanner_delim(delim = "__") %>%
  fmt_number(columns = contains("numeric"), decimals = 2) %>%
  fmt_percent(columns = contains("complete"), decimals = 0) %>% 
  # change all the numeric.mean and numeric.sd column names to Mean and SD
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
  ) %>% 
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
  ) %>%
  data_color(
    columns = contains("numeric.mean"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
2017 2018 2019 2020 2021
complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD
A1_B1 100% 0 0.95 0.22 100% 0 0.95 0.21 100% 0 0.96 0.21 100% 0 0.96 0.20 100% 0 0.95 0.21
A2_B0 100% 0 0.31 0.46 100% 0 0.28 0.45 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A3_B2 100% 0 0.62 0.48 100% 0 0.64 0.48 100% 0 0.67 0.47 100% 0 0.66 0.48 100% 0 0.67 0.47
A4_B3 100% 0 0.64 0.48 100% 0 0.62 0.48 100% 0 0.64 0.48 100% 0 0.62 0.48 100% 0 0.60 0.49
A5_B4 100% 0 0.47 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50 100% 0 0.49 0.50
A6_B5 100% 0 0.71 0.46 100% 0 0.73 0.44 100% 0 0.74 0.44 100% 0 0.76 0.42 100% 0 0.74 0.44
A7_B6 100% 0 0.71 0.46 100% 0 0.70 0.46 100% 0 0.69 0.46 100% 0 0.70 0.46 100% 0 0.68 0.47
A8_B7 100% 0 0.47 0.50 100% 0 0.50 0.50 100% 0 0.52 0.50 100% 0 0.60 0.49 100% 0 0.58 0.49
A9_B8 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.43 0.50 100% 0 0.42 0.49
A10_B9 100% 0 0.54 0.50 100% 0 0.54 0.50 100% 0 0.55 0.50 100% 0 0.55 0.50 100% 0 0.53 0.50
A11_B10 100% 0 0.57 0.49 100% 0 0.58 0.49 100% 0 0.58 0.49 100% 0 0.59 0.49 100% 0 0.57 0.50
A12_B0 100% 0 0.68 0.47 100% 0 0.71 0.45 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A13_B0 100% 0 0.37 0.48 100% 0 0.36 0.48 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A14_B12 100% 0 0.56 0.50 100% 0 0.57 0.49 100% 0 0.59 0.49 100% 0 0.54 0.50 100% 0 0.54 0.50
A15_B13 100% 0 0.46 0.50 100% 0 0.46 0.50 100% 0 0.47 0.50 100% 0 0.46 0.50 100% 0 0.45 0.50
A16_B14 100% 0 0.19 0.39 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.20 0.40
A17_B15 100% 0 0.59 0.49 100% 0 0.60 0.49 100% 0 0.60 0.49 100% 0 0.59 0.49 100% 0 0.57 0.49
A18_B16 100% 0 0.38 0.49 100% 0 0.36 0.48 100% 0 0.28 0.45 100% 0 0.27 0.45 100% 0 0.28 0.45
A19_B0 100% 0 0.35 0.48 100% 0 0.34 0.47 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A20_B17 100% 0 0.72 0.45 100% 0 0.72 0.45 100% 0 0.74 0.44 100% 0 0.71 0.46 100% 0 0.71 0.45
A21_B18 100% 0 0.51 0.50 100% 0 0.52 0.50 100% 0 0.54 0.50 100% 0 0.52 0.50 100% 0 0.51 0.50
A22_B19 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.52 0.50 100% 0 0.50 0.50
A23_B20 100% 0 0.41 0.49 100% 0 0.43 0.50 100% 0 0.42 0.49 100% 0 0.42 0.49 100% 0 0.42 0.49
A24_B21 100% 0 0.59 0.49 100% 0 0.58 0.49 100% 0 0.60 0.49 100% 0 0.56 0.50 100% 0 0.57 0.50
A25_B0 100% 0 0.54 0.50 100% 0 0.73 0.44 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A26_B22 100% 0 0.80 0.40 100% 0 0.80 0.40 100% 0 0.80 0.40 100% 0 0.81 0.39 100% 0 0.80 0.40
A27_B23 100% 0 0.39 0.49 100% 0 0.40 0.49 100% 0 0.42 0.49 100% 0 0.40 0.49 100% 0 0.41 0.49
A28_B24 100% 0 0.42 0.49 100% 0 0.46 0.50 100% 0 0.45 0.50 100% 0 0.42 0.49 100% 0 0.41 0.49
A29_B25 100% 0 0.48 0.50 100% 0 0.52 0.50 100% 0 0.51 0.50 100% 0 0.49 0.50 100% 0 0.50 0.50
A30_B0 100% 0 0.76 0.43 100% 0 0.73 0.44 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A31_B27 100% 0 0.38 0.48 100% 0 0.37 0.48 100% 0 0.39 0.49 100% 0 0.40 0.49 100% 0 0.40 0.49
A32_B28 100% 0 0.23 0.42 100% 0 0.20 0.40 100% 0 0.20 0.40 100% 0 0.21 0.41 100% 0 0.20 0.40
A33_B29 100% 0 0.78 0.42 100% 0 0.76 0.43 100% 0 0.77 0.42 100% 0 0.74 0.44 100% 0 0.75 0.43
A34_B0 100% 0 0.58 0.49 100% 0 0.61 0.49 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A35_B0 100% 0 0.32 0.47 100% 0 0.33 0.47 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A36_B0 100% 0 0.22 0.41 100% 0 0.23 0.42 0% 1999 NaN NA 0% 2191 NaN NA 0% 2048 NaN NA
A0_B11 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.57 0.50 100% 0 0.59 0.49 100% 0 0.58 0.49
A0_B26 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.54 0.50 100% 0 0.53 0.50 100% 0 0.52 0.50
A0_B30 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.61 0.49 100% 0 0.61 0.49 100% 0 0.60 0.49
A0_B31 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.35 0.48 100% 0 0.33 0.47 100% 0 0.45 0.50
A0_B32 0% 1682 NaN NA 0% 1751 NaN NA 100% 0 0.20 0.40 100% 0 0.21 0.41 100% 0 0.21 0.41

2 Testing assumptions

Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.

Since the combined data on the two versions of the test have large portions of “missing” data (e.g. no responses to new questions from students who completed the old test), it is not possible to carry out the factor analysis as in the analyse-test script, since the factor analysis routine does not work with missing data.

Instead, in the next section we proceed directly to fitting the IRT model, and using the \(Q_3\) check for local independence. In the final section, we also run a factor analysis for the data from the new test only.

3 Fitting a 2-parameter logistic model with mirt

We use the mirt package to fit an item response theory model.

For this analysis, we use the 2-parameter logistic (2PL) model.
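Under the 2PL model, the probability that student \(k\) with ability \(\theta_k\) answers item \(i\) correctly is

```latex
P(x_{ik} = 1 \mid \theta_k) = \frac{1}{1 + e^{-a_i(\theta_k - b_i)}}
```

where \(a_i\) is the item’s discrimination and \(b_i\) its difficulty.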

Show model fitting output
# save just the matrix of scores
item_scores <- test_scores %>% 
  select(matches("^A\\d"))

fit_2pl <- mirt(
  data = item_scores, # just the columns with question scores
  #removeEmptyRows = TRUE, 
  model = 1,          # number of factors to extract
  itemtype = "2PL",   # 2 parameter logistic model
  SE = TRUE           # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -186638.319, Max-Change: 4.61593
Iteration: 2, Log-Lik: -173222.852, Max-Change: 3.34257
Iteration: 3, Log-Lik: -172025.296, Max-Change: 0.75461
Iteration: 4, Log-Lik: -171559.893, Max-Change: 0.30299
Iteration: 5, Log-Lik: -171310.677, Max-Change: 0.29565
Iteration: 6, Log-Lik: -171163.155, Max-Change: 0.15660
Iteration: 7, Log-Lik: -171055.541, Max-Change: 0.15756
Iteration: 8, Log-Lik: -170980.492, Max-Change: 0.10202
Iteration: 9, Log-Lik: -170924.986, Max-Change: 0.13257
Iteration: 10, Log-Lik: -170881.349, Max-Change: 0.08812
Iteration: 11, Log-Lik: -170846.140, Max-Change: 0.07497
Iteration: 12, Log-Lik: -170823.782, Max-Change: 0.04927
Iteration: 13, Log-Lik: -170807.153, Max-Change: 0.06731
Iteration: 14, Log-Lik: -170793.659, Max-Change: 0.03988
Iteration: 15, Log-Lik: -170783.377, Max-Change: 0.04419
Iteration: 16, Log-Lik: -170775.383, Max-Change: 0.02498
Iteration: 17, Log-Lik: -170768.350, Max-Change: 0.03335
Iteration: 18, Log-Lik: -170762.835, Max-Change: 0.01562
Iteration: 19, Log-Lik: -170757.759, Max-Change: 0.02149
Iteration: 20, Log-Lik: -170753.836, Max-Change: 0.01406
Iteration: 21, Log-Lik: -170750.152, Max-Change: 0.01118
Iteration: 22, Log-Lik: -170744.133, Max-Change: 0.01296
Iteration: 23, Log-Lik: -170741.889, Max-Change: 0.01003
Iteration: 24, Log-Lik: -170740.015, Max-Change: 0.00919
Iteration: 25, Log-Lik: -170732.968, Max-Change: 0.00538
Iteration: 26, Log-Lik: -170732.507, Max-Change: 0.00457
Iteration: 27, Log-Lik: -170732.168, Max-Change: 0.00410
Iteration: 28, Log-Lik: -170730.815, Max-Change: 0.00252
Iteration: 29, Log-Lik: -170730.746, Max-Change: 0.00278
Iteration: 30, Log-Lik: -170730.686, Max-Change: 0.00187
Iteration: 31, Log-Lik: -170730.559, Max-Change: 0.00125
Iteration: 32, Log-Lik: -170730.538, Max-Change: 0.00104
Iteration: 33, Log-Lik: -170730.519, Max-Change: 0.00099
Iteration: 34, Log-Lik: -170730.435, Max-Change: 0.00068
Iteration: 35, Log-Lik: -170730.427, Max-Change: 0.00035
Iteration: 36, Log-Lik: -170730.424, Max-Change: 0.00018
Iteration: 37, Log-Lik: -170730.419, Max-Change: 0.00019
Iteration: 38, Log-Lik: -170730.417, Max-Change: 0.00018
Iteration: 39, Log-Lik: -170730.416, Max-Change: 0.00016
Iteration: 40, Log-Lik: -170730.408, Max-Change: 0.00013
Iteration: 41, Log-Lik: -170730.408, Max-Change: 0.00013
Iteration: 42, Log-Lik: -170730.407, Max-Change: 0.00013
Iteration: 43, Log-Lik: -170730.403, Max-Change: 0.00056
Iteration: 44, Log-Lik: -170730.401, Max-Change: 0.00046
Iteration: 45, Log-Lik: -170730.399, Max-Change: 0.00046
Iteration: 46, Log-Lik: -170730.398, Max-Change: 0.00029
Iteration: 47, Log-Lik: -170730.397, Max-Change: 0.00011
Iteration: 48, Log-Lik: -170730.397, Max-Change: 0.00005
## 
## Calculating information matrix...

3.1 Local independence

We compute Yen’s \(Q_3\) statistic (1984) to check for residual dependence between items after controlling for \(\theta\). This gives a value for each pair of items, with absolute values above 0.2 regarded as problematic (see DeMars, p. 48).
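Concretely, for items \(i\) and \(j\), \(Q_3\) is the correlation across students of the residuals left after the fitted model’s predictions are subtracted:

```latex
d_{ik} = x_{ik} - P_i(\hat{\theta}_k), \qquad
Q_{3,ij} = \operatorname{cor}(d_i, d_j)
```

where \(P_i(\hat{\theta}_k)\) is the model-predicted probability of a correct response from student \(k\). Under local independence, these residual correlations should be close to zero.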

# compute Yen's Q3 statistic for each pair of items
residuals_2pl <- residuals(fit_2pl, type = "Q3") %>% 
  as.data.frame()

residuals_2pl %>% as.matrix() %>% 
  corrplot::corrplot(type = "upper")

This shows that most item pairs are independent, with only a couple of pairs showing cause for concern:

residuals_2pl %>%
  rownames_to_column(var = "item1") %>%
  as_tibble() %>% 
  pivot_longer(cols = starts_with("A"), names_to = "item2", values_to = "Q3_score") %>% 
  filter(abs(Q3_score) > 0.2) %>% 
  filter(parse_number(item1) < parse_number(item2)) %>%
  gt()
item1 item2 Q3_score
A18_B16 A19_B0 0.6739582
A34_B0 A35_B0 0.2105944

In fact, both of these pairs involve a question that was removed from the test:

  • A18 and A19 are on the product and quotient rules; only A18 was retained on the new test.

  • A34 and A35 are both about vectors; only A34 was retained (in modified form, as B30).

Given that this violation of the local independence assumption is very mild, we proceed using this model.

3.2 Model parameters

We then compute factor score estimates and add them to the existing data frame, to keep everything in one place. To do the estimation, we use the fscores() function from the mirt package, which takes a fitted model object and computes factor score estimates using the specified method. We use the default “expected a posteriori” (EAP) method.

test_scores <- test_scores %>%
  # fscores() returns a one-column matrix, so take the first column
  mutate(F1 = fscores(fit_2pl, method = "EAP")[, 1])
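For reference, the EAP estimate is the mean of the posterior distribution of \(\theta\) given a student’s response pattern \(x\), using a standard normal prior \(\phi\):

```latex
\hat{\theta}_{\mathrm{EAP}}
  = \frac{\int \theta \, L(x \mid \theta) \, \phi(\theta) \, d\theta}
         {\int L(x \mid \theta) \, \phi(\theta) \, d\theta}
```

where \(L(x \mid \theta)\) is the likelihood of the response pattern under the fitted 2PL model.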

We can also extract the model coefficient estimates using the generic coef() function, which extracts model coefficients from objects returned by modelling functions. Setting the IRTpars argument to TRUE converts the slope-intercept parameters into traditional IRT parameters.

coefs_2pl <- coef(fit_2pl, IRTpars = TRUE)

The resulting object coefs_2pl is a list, with one element for each question, and an additional GroupPars element that we won’t be using. For each question, the object records several values:

  • a is discrimination
  • b is difficulty
  • endpoints of the 95% confidence intervals are also shown
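Internally, mirt parameterises the 2PL in slope-intercept form, \(a\theta + d\); with IRTpars = TRUE the traditional difficulty is recovered from the intercept as

```latex
b = -\frac{d}{a}
```

so a larger \(b\) corresponds to a harder item.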

To make this output a little more user friendly, we use the tidy_mirt_coefs function that we have provided, to produce a single data frame with a row for each question.

source("fn_tidy_mirt_coefs.R")

tidy_2pl <- tidy_mirt_coefs(coefs_2pl) %>% 
  filter(par %in% c("a", "b")) %>%  
  left_join(test_versions %>% select(-item_num), by = c("Question" = "label"))

tidy_2pl_wide <- tidy_2pl %>% 
  pivot_wider(names_from = "par", values_from = c(est, CI_2.5, CI_97.5), names_glue = "{par}_{.value}")

Here is a nicely formatted table of the result:

tidy_2pl %>%
  select(-pre,-post,-notes) %>% 
  group_by(outcome) %>% 
  mutate(ci = str_glue("[{round(CI_2.5, 2)}, {round(CI_97.5, 2)}]")) %>% 
  select(-starts_with("CI_")) %>% 
  pivot_wider(names_from = "par", values_from = c(est, ci), names_glue = "{par}_{.value}") %>% 
  gt() %>% 
  fmt_number(columns = contains("_est"), decimals = 2) %>%
  data_color(
    columns = contains("a_est"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("b_est"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = contains("b_")) %>%
  cols_label(
    a_est = "Est.",
    b_est = "Est.",
    a_ci = "95% CI",
    b_ci = "95% CI"
  )
Question description Discrimination Difficulty
Est. 95% CI Est. 95% CI
unchanged
A1_B1 linear function 1.26 [1.14, 1.38] −2.92 [-3.13, -2.71]
A3_B2 arithmetic rules 1.29 [1.23, 1.36] −0.64 [-0.69, -0.6]
A4_B3 Easy ineq. 1.15 [1.09, 1.21] −0.57 [-0.62, -0.52]
A5_B4 logs 1.43 [1.35, 1.5] 0.05 [0.01, 0.09]
A6_B5 logs 1.51 [1.43, 1.6] −0.97 [-1.02, -0.92]
A7_B6 graph translation 1.37 [1.3, 1.44] −0.82 [-0.87, -0.77]
A8_B7 sine pi/3 1.18 [1.11, 1.24] −0.18 [-0.22, -0.13]
A9_B8 trig.ineq. 1.98 [1.89, 2.08] 0.17 [0.14, 0.2]
A10_B9 trig.identity 1.74 [1.66, 1.83] −0.16 [-0.19, -0.12]
A11_B10 period 1.61 [1.53, 1.69] −0.30 [-0.34, -0.26]
A14_B12 limit 1.47 [1.4, 1.55] −0.23 [-0.27, -0.19]
A15_B13 series 0.83 [0.78, 0.88] 0.21 [0.16, 0.27]
A16_B14 diff.quotient 1.41 [1.32, 1.49] 1.34 [1.28, 1.41]
A17_B15 graph f' 1.22 [1.15, 1.28] −0.39 [-0.43, -0.34]
A18_B16 product rule 1.08 [1.02, 1.15] 0.91 [0.85, 0.97]
A20_B17 (exp)' 2.18 [2.06, 2.29] −0.75 [-0.78, -0.71]
A21_B18 (ln(sin))' 2.10 [2, 2.2] −0.07 [-0.1, -0.03]
A22_B19 hyp.functions 1.99 [1.9, 2.09] −0.05 [-0.08, -0.01]
A23_B20 slope tangent 1.94 [1.85, 2.03] 0.26 [0.23, 0.3]
A24_B21 IVT 1.41 [1.34, 1.49] −0.31 [-0.35, -0.27]
A26_B22 int(poly) 2.38 [2.25, 2.51] −1.06 [-1.1, -1.02]
A27_B23 int(exp) 1.89 [1.8, 1.99] 0.32 [0.29, 0.36]
A28_B24 Int = 0 1.85 [1.76, 1.94] 0.23 [0.2, 0.27]
A29_B25 int even funct 0.83 [0.78, 0.88] 0.00 [-0.05, 0.06]
A31_B27 int(abs) 1.29 [1.23, 1.36] 0.45 [0.4, 0.49]
A32_B28 FtoC algebra 1.52 [1.44, 1.61] 1.22 [1.16, 1.28]
A33_B29 difference vectors 0.99 [0.93, 1.06] −1.37 [-1.46, -1.29]
removed
A2_B0 3D 0.55 [0.47, 0.64] 1.69 [1.43, 1.96]
A12_B0 rational exponents 0.88 [0.78, 0.97] −1.11 [-1.24, -0.98]
A13_B0 ellipsoid 0.71 [0.62, 0.79] 0.86 [0.73, 1]
A19_B0 quotient rule 1.27 [1.16, 1.38] 0.65 [0.58, 0.73]
A25_B0 velocity 0.53 [0.45, 0.61] −1.14 [-1.35, -0.94]
A30_B0 FtoC graph 1.25 [1.13, 1.37] −1.10 [-1.2, -1]
A34_B0 normal vector 0.89 [0.8, 0.99] −0.52 [-0.62, -0.43]
A35_B0 intersect planes 1.26 [1.15, 1.37] 0.74 [0.66, 0.82]
A36_B0 parallel planes 1.23 [1.11, 1.35] 1.30 [1.19, 1.41]
added
A0_B11 rational exponents 0.77 [0.7, 0.84] −0.48 [-0.56, -0.4]
A0_B26 FtoC graph 1.08 [1.01, 1.16] −0.15 [-0.2, -0.09]
A0_B30 normal vector 0.81 [0.75, 0.88] −0.59 [-0.67, -0.51]
A0_B31 vector product 1.01 [0.94, 1.09] 0.62 [0.55, 0.69]
A0_B32 scalar product 1.32 [1.22, 1.41] 1.32 [1.24, 1.41]

These values are also saved to the file output/eth_post_2pl-results.csv.

tidy_2pl_wide %>% 
  write_csv("output/eth_post_2pl-results.csv")

This shows the difficulty and discrimination of each of the questions, highlighting those that were added or removed:

tidy_2pl_wide %>% 
  mutate(qnum = parse_number(Question)) %>% 
  ggplot(aes(
    x = a_est,
    y = b_est,
    label = case_when(
      outcome == "unchanged" ~ "",
      outcome == "removed" ~ pre,
      outcome == "added" ~ post
    ),
    colour = outcome
  )) +
  geom_errorbar(aes(ymin = b_CI_2.5, ymax = b_CI_97.5), width = 0, alpha = 0.5) +
  geom_errorbar(aes(xmin = a_CI_2.5, xmax = a_CI_97.5), width = 0, alpha = 0.5) +
  geom_text_repel() +
  geom_point() +
  theme_minimal() +
  labs(x = "Discrimination",
       y = "Difficulty")

3.3 Comparing years and classes

Do students from different years or programmes of study have different distributions of ability?

First, compare the distribution of abilities across the year groups:

ggplot(test_scores, aes(F1, y = year, fill = as.factor(year), colour = as.factor(year))) +
  geom_density_ridges(alpha=0.5) + 
  scale_x_continuous(limits = c(-3.5,3.5)) +
  labs(title = "Density plot", 
       subtitle = "Ability grouped by year of taking the test", 
       x = "Ability", y = "Density",
       fill = "Year", colour = "Year") +
  theme_minimal()
## Picking joint bandwidth of 0.189

3.4 Information curves

theta <- seq(-6, 6, by=0.05)

info_matrix <- testinfo(fit_2pl, theta, individual = TRUE)
colnames(info_matrix) <- test_versions %>% pull(label)
item_info_data <- info_matrix %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "item", values_to = "info_y") %>% 
  mutate(item = fct_reorder(item, parse_number(item)))

plot(fit_2pl, type = "infoSE", main = "Test information")
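For the 2PL model, each item’s information function is determined by its parameters, the test information is the sum over items, and the “infoSE” plot also shows the corresponding standard error of the ability estimate:

```latex
I_i(\theta) = a_i^2 \, P_i(\theta)\bigl(1 - P_i(\theta)\bigr), \qquad
I(\theta) = \sum_i I_i(\theta), \qquad
SE(\theta) = \frac{1}{\sqrt{I(\theta)}}
```

Highly discriminating items contribute sharply peaked information near \(\theta = b_i\), which is why the most informative items below are those with the largest \(a\) estimates.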

3.4.1 Item information curves

The information curves for each question help to highlight those questions that are most/least informative:

item_info_data %>% 
  ggplot(aes(x = theta, y = info_y, colour = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Information") +
  theme_minimal()

3.5 Total information

Using mirt’s areainfo function, we can find the total area under the information curves.

info_irt <- areainfo(fit_2pl, c(-4,4))
info_irt %>% gt()
LowerBound UpperBound Info TotalInfo Proportion nitems
-4 4 52.88075 54.46439 0.9709234 41

This shows that the total information across all items is approximately 54.46, of which about 97% (52.88) lies in the range \(-4\leq\theta\leq 4\).
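In these terms, the Info and Proportion columns report the area under the test information curve over the requested interval, and that area as a share of the total:

```latex
\text{Info} = \int_{-4}^{4} I(\theta)\, d\theta, \qquad
\text{Proportion} = \frac{\text{Info}}{\text{TotalInfo}}
```

where TotalInfo integrates \(I(\theta)\) over the whole ability range.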

3.5.1 Information by item

tidy_info <- test_versions %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(fit_2pl,
               c(-4, 4),
               which.items = .x) %>% pull(TotalInfo)
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  sub_missing(columns = one_of("notes", "pre", "post"), missing_text = "") %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
pre post description notes label outcome Information
A26 B22 int(poly) A26_B22 unchanged 2.3802120
A20 B17 (exp)' A20_B17 unchanged 2.1756893
A21 B18 (ln(sin))' A21_B18 unchanged 2.0980069
A22 B19 hyp.functions A22_B19 unchanged 1.9933525
A9 B8 trig.ineq. A9_B8 unchanged 1.9831140
A23 B20 slope tangent A23_B20 unchanged 1.9408823
A27 B23 int(exp) A27_B23 unchanged 1.8936276
A28 B24 Int = 0 A28_B24 unchanged 1.8527290
A10 B9 trig.identity A10_B9 unchanged 1.7433688
A11 B10 period A11_B10 unchanged 1.6077264
A33 B29 difference vectors A33_B29 unchanged 1.5215297
A6 B5 logs A6_B5 unchanged 1.5147947
A14 B12 limit A14_B12 unchanged 1.4742547
A5 B4 logs A5_B4 unchanged 1.4257166
A24 B21 IVT A24_B21 unchanged 1.4143219
A16 B14 diff.quotient A16_B14 unchanged 1.4056184
A7 B6 graph translation A7_B6 unchanged 1.3700318
B30 normal vector updated version of A34 A0_B30 added 1.3176397
A3 B2 arithmetic rules A3_B2 unchanged 1.2944966
A32 B28 FtoC algebra A32_B28 unchanged 1.2929143
A19 quotient rule A19_B0 removed 1.2709396
A1 B1 linear function A1_B1 unchanged 1.2607121
A30 FtoC graph adjusted to B26 A30_B0 removed 1.2590765
A31 B27 int(abs) A31_B27 unchanged 1.2511116
A34 normal vector adjusted to B30 A34_B0 removed 1.2255439
A17 B15 graph f' A17_B15 unchanged 1.2156068
A8 B7 sine pi/3 A8_B7 unchanged 1.1759131
A4 B3 Easy ineq. A4_B3 unchanged 1.1498403
B31 vector product A0_B31 added 1.0840039
A18 B16 product rule A18_B16 unchanged 1.0825203
B26 FtoC graph updated version of A30 A0_B26 added 1.0127943
A35 intersect planes A35_B0 removed 0.9911898
A36 parallel planes A36_B0 removed 0.8938061
A12 rational exponents adjusted to B11 A12_B0 removed 0.8758781
A29 B25 int even funct A29_B25 unchanged 0.8301225
A15 B13 series A15_B13 unchanged 0.8287906
B32 scalar product A0_B32 added 0.8117541
B11 rational exponents updated version of A12 A0_B11 added 0.7690007
A13 ellipsoid A13_B0 removed 0.7074127
A2 3D A2_B0 removed 0.5473019
A25 velocity A25_B0 removed 0.5210397

Restricting instead to the range \(-2\leq\theta\leq2\):

tidy_info <- test_versions %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(fit_2pl,
               c(-2, 2),
               which.items = .x) %>% pull(Info)
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  sub_missing(columns = one_of("notes", "pre", "post"), missing_text = "") %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
pre post description notes label outcome Information
A26 B22 int(poly) A26_B22 unchanged 2.1496515
A20 B17 (exp)' A20_B17 unchanged 2.0368241
A21 B18 (ln(sin))' A21_B18 unchanged 2.0351851
A22 B19 hyp.functions A22_B19 unchanged 1.9204031
A9 B8 trig.ineq. A9_B8 unchanged 1.9053008
A23 B20 slope tangent A23_B20 unchanged 1.8528009
A27 B23 int(exp) A27_B23 unchanged 1.7945879
A28 B24 Int = 0 A28_B24 unchanged 1.7559052
A10 B9 trig.identity A10_B9 unchanged 1.6363424
A11 B10 period A11_B10 unchanged 1.4708151
A14 B12 limit A14_B12 unchanged 1.3203486
A5 B4 logs A5_B4 unchanged 1.2696657
A24 B21 IVT A24_B21 unchanged 1.2437250
A6 B5 logs A6_B5 unchanged 1.2360528
A33 B29 difference vectors A33_B29 unchanged 1.1534582
A7 B6 graph translation A7_B6 unchanged 1.1156355
A32 B28 FtoC algebra A32_B28 unchanged 1.0872317
A3 B2 arithmetic rules A3_B2 unchanged 1.0630819
A19 quotient rule A19_B0 removed 1.0346021
A30 FtoC graph adjusted to B26 A30_B0 removed 1.0059374
A17 B15 graph f' A17_B15 unchanged 1.0022925
A16 B14 diff.quotient A16_B14 unchanged 0.9940931
A8 B7 sine pi/3 A8_B7 unchanged 0.9682115
A31 B27 int(abs) A31_B27 unchanged 0.9193184
B30 normal vector updated version of A34 A0_B30 added 0.9180818
A4 B3 Easy ineq. A4_B3 unchanged 0.9070485
B31 vector product A0_B31 added 0.8594470
A34 normal vector adjusted to B30 A34_B0 removed 0.8404270
A18 B16 product rule A18_B16 unchanged 0.7834848
B26 FtoC graph updated version of A30 A0_B26 added 0.7456596
A36 parallel planes A36_B0 removed 0.6208678
A35 intersect planes A35_B0 removed 0.6115641
A29 B25 int even funct A29_B25 unchanged 0.5653949
A15 B13 series A15_B13 unchanged 0.5615278
A12 rational exponents adjusted to B11 A12_B0 removed 0.5473130
B32 scalar product A0_B32 added 0.5278274
B11 rational exponents updated version of A12 A0_B11 added 0.4880861
A13 ellipsoid A13_B0 removed 0.4075948
A1 B1 linear function A1_B1 unchanged 0.2991127
A25 velocity A25_B0 removed 0.2378399
A2 3D A2_B0 removed 0.2369482

3.5.2 Evaluating the changes

info_comparison_data <- item_info_data %>% 
  mutate(item = as.character(item)) %>% 
  left_join(test_versions %>% mutate(item = as.character(label)), by = "item") %>% 
  group_by(theta) %>% 
  summarise(
    items_pre = sum(ifelse(outcome %in% c("unchanged", "removed"), info_y, 0)),
    items_post = sum(ifelse(outcome %in% c("unchanged", "added"), info_y, 0)),
    items_added = sum(ifelse(outcome %in% c("added"), info_y, 0)),
    items_removed = sum(ifelse(outcome %in% c("removed"), info_y, 0)),
    n_added = sum(ifelse(outcome %in% c("added"), 1, 0)),
    items_added_mean = sum(ifelse(outcome %in% c("added"), info_y, 0)) / n_added,
    items_removed_mean = sum(ifelse(outcome %in% c("removed"), info_y, 0)) / sum(ifelse(outcome %in% c("removed"), 1, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y")
test_info_total_plot <- info_comparison_data %>% 
  filter(items %in% c("pre", "post")) %>% 
  mutate(items = case_when(items == "pre" ~ "Version A", items == "post" ~ "Version B")) %>% 
  #mutate(items = fct_relevel(items, "pre", "post")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items, linetype = items)) +
  geom_line() +
  scale_colour_brewer("ETH s21t", palette = "Set1") +
  scale_linetype_manual("ETH s21t", values = c("dashed", "solid")) +
  labs(x = "Ability", y = "Information") +
  theme_minimal() +
  theme(legend.position = "top",
        legend.title = element_text(face = "bold"))

#test_info_total_plot
ggsave("output/eth_pre-vs-post_info.pdf", units = "cm", width = 8, height = 8)
test_info_changes_plot <- info_comparison_data %>% 
  filter(!items %in% c("pre", "post")) %>% 
  mutate(panel = ifelse(str_detect(items, "_mean"), "Mean information\nper question", "Total information")) %>% 
  mutate(panel = fct_rev(panel)) %>% 
  mutate(items = str_remove(items, "_mean")) %>% 
  #mutate(items = fct_relevel(items, "pre", "post")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items, linetype = items)) +
  geom_line() +
  scale_colour_brewer("Questions", palette = "Paired", direction = -1) +
  scale_linetype_manual("Questions", values = c("solid", "dashed")) +
  facet_wrap(vars(panel), scales = "free_y") +
  labs(x = "Ability", y = "Information") +
  theme_minimal() +
  theme(legend.position = "top")

#test_info_changes_plot
ggsave("output/eth_pre-vs-post_changes.pdf", units = "cm", width = 16, height = 8)
test_info_total_plot + test_info_changes_plot + patchwork::plot_layout(widths = c(2, 3))

ggsave("output/eth_pre-vs-post_info-summary.pdf", units = "cm", width = 16, height = 8)

This shows that the added questions offer, on average, slightly more information than the questions they replaced (though there are fewer of them, hence the lower total information shown above). The difference is largest in the ability range 0 to 2.

3.6 Response curves

trace_data <- probtrace(fit_2pl, theta) %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "level", values_to = "y") %>% 
  separate(level, into = c("item", NA, "score"), sep = "\\.")

trace_data %>% 
  mutate(item = fct_reorder(item, parse_number(item))) %>% 
  ggplot(aes(x = theta, y = y, colour = score)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Probability of response") +
  theme_minimal()

This plot combines all the trace lines onto a single set of axes:

plt <- trace_data %>% 
  filter(score == 1) %>% 
  left_join(test_versions %>% mutate(item = as.factor(label)), by = "item") %>% 
  mutate(item_removed = (outcome == "removed")) %>% 
  ggplot(aes(x = theta, y = y, colour = item, text = item)) +
  geom_line(aes(linetype = outcome)) +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  scale_linetype_manual("outcome", values = c("unchanged" = "solid", "removed" = "dashed", "added" = "twodash")) +
  labs(y = "Expected score") +
  theme_minimal()

ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/eth_pre-vs-post_iccs-superimposed.pdf", width = 20, height = 14, units = "cm")

That is quite a busy plot, so here we focus only on the changes:

plt <- trace_data %>% 
  filter(score == 1) %>% 
  left_join(test_versions %>% mutate(item = as.factor(label)), by = "item") %>% 
  filter(outcome != "unchanged") %>% 
  mutate(item_removed = (outcome == "removed")) %>% 
  ggplot(aes(x = theta, y = y, colour = item, text = item)) +
  geom_line(aes(linetype = outcome)) +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  scale_linetype_manual("outcome", values = c("removed" = "dashed", "added" = "solid")) +
  labs(y = "Expected score") +
  theme_minimal()

ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/eth_pre-vs-post_iccs-superimposed-highlight.pdf", width = 20, height = 14, units = "cm")

4 Factor analysis for the new test only

4.1 Factor analysis setup

Here we redo the factor analysis, but using only the data from the new version of the test.

item_scores_B <- test_scores %>% 
  select(-F1) %>% 
  select(-contains("B0")) %>% 
  filter(test_version == "post") %>% 
  select(-class, -year, -test_version)

The parameters package provides functions that run various checks to see if the data is suitable for factor analysis, and if so, how many factors should be retained.

structure <- check_factorstructure(item_scores_B)
n <- n_factors(item_scores_B)

4.1.0.1 Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.97).
  • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(496) = 44502.66, p < .001).
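These two checks can also be run individually rather than via check_factorstructure(); a minimal sketch, assuming the parameters package's check_kmo() and check_sphericity_bartlett() helpers:

```r
# Sketch only (not part of the original analysis): run the two
# suitability checks separately, using the parameters package.
library(parameters)

check_kmo(item_scores_B)                  # Kaiser-Meyer-Olkin sampling adequacy
check_sphericity_bartlett(item_scores_B)  # Bartlett's test of sphericity
```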

4.1.0.2 Method Agreement Procedure:

The choice of 4 dimensions is supported by 5 (26.32%) methods out of 19 (beta, Optimal coordinates, Parallel analysis, Kaiser criterion, Scree (SE)).

plot(n)

summary(n) %>% gt()
n_Factors n_Methods
1 4
2 1
3 2
4 5
5 1
6 1
12 2
29 3
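To see which specific methods voted for each candidate number of factors, the n_factors() result can be unpacked; a sketch, assuming the result coerces to a data frame with n_Factors and Method columns (as in the parameters package output):

```r
# Sketch: list the methods behind each candidate number of factors.
# Column names assume the parameters::n_factors() output format.
as.data.frame(n) %>% 
  group_by(n_Factors) %>% 
  summarise(methods = paste(Method, collapse = ", "), .groups = "drop")
```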

The scree plot shows the eigenvalues associated with each factor:

fa.parallel(item_scores_B, fa = "fa")

## Parallel analysis suggests that the number of factors =  7  and the number of components =  NA

Based on the scree plot and the method agreement results above, there is support for a 1-factor solution. We also consider the 4-factor solution favoured by the method agreement procedure.

4.2 1-factor solution

Show factanal output
fitfact <- factanal(item_scores_B, factors = 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A8_B7   A9_B8  A10_B9 A11_B10 
##    0.95    0.78    0.80    0.73    0.77    0.76    0.78    0.62    0.66    0.70 
## A14_B12 A15_B13 A16_B14 A17_B15 A18_B16 A20_B17 A21_B18 A22_B19 A23_B20 A24_B21 
##    0.71    0.88    0.83    0.78    0.86    0.66    0.60    0.61    0.62    0.73 
## A26_B22 A27_B23 A28_B24 A29_B25 A31_B27 A32_B28 A33_B29  A0_B11  A0_B26  A0_B30 
##    0.70    0.65    0.64    0.88    0.76    0.80    0.86    0.89    0.81    0.88 
##  A0_B31  A0_B32 
##    0.84    0.83 
## 
## Loadings:
##  [1] 0.52 0.62 0.58 0.55 0.54 0.58 0.63 0.62 0.62 0.52 0.54 0.59 0.60      0.47
## [16] 0.45 0.48 0.49 0.47 0.35 0.41 0.47 0.38 0.34 0.49 0.45 0.37 0.33 0.44 0.34
## [31] 0.41 0.41
## 
##                Factor1
## SS loadings       7.63
## Proportion Var    0.24
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 3991.49 on 464 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)
ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = variable), show.legend = FALSE) +
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(test_versions %>% select(question = label, description), by = "question") %>% 
  arrange(-factor_loading) %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
question factor_loading description
A21_B18 0.6335602 (ln(sin))'
A22_B19 0.6229067 hyp.functions
A9_B8 0.6173382 trig.ineq.
A23_B20 0.6167307 slope tangent
A28_B24 0.5989062 Int = 0
A27_B23 0.5934801 int(exp)
A20_B17 0.5839634 (exp)'
A10_B9 0.5823615 trig.identity
A11_B10 0.5489394 period
A26_B22 0.5440885 int(poly)
A14_B12 0.5367772 limit
A5_B4 0.5215381 logs
A24_B21 0.5153658 IVT
A31_B27 0.4918262 int(abs)
A7_B6 0.4910048 graph translation
A6_B5 0.4757299 logs
A3_B2 0.4702819 arithmetic rules
A8_B7 0.4697829 sine pi/3
A17_B15 0.4670353 graph f'
A32_B28 0.4470902 FtoC algebra
A4_B3 0.4454821 Easy ineq.
A0_B26 0.4373808 FtoC graph
A16_B14 0.4142581 diff.quotient
A0_B32 0.4094848 scalar product
A0_B31 0.4058509 vector product
A18_B16 0.3802267 product rule
A33_B29 0.3700329 difference vectors
A15_B13 0.3531196 series
A0_B30 0.3422352 normal vector
A29_B25 0.3420163 int even funct
A0_B11 0.3327412 rational exponents
A1_B1 0.2246888 linear function

The questions that load most strongly on this factor are very standard calculations in integration, differentiation, and trigonometry – which is consistent with the aim of the test.
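The strongest-loading items can also be pulled out programmatically from the tidied loadings above; a sketch, where the 0.55 cutoff is an arbitrary illustration rather than part of the analysis:

```r
# Sketch: items loading most strongly on the single factor.
# The 0.55 threshold is illustrative only.
load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  filter(factor_loading > 0.55) %>% 
  arrange(desc(factor_loading))
```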

4.3 4-factor solution

Here we also investigate the 4-factor solution, to see whether these factors are interpretable.

Show factanal output
fitfact4 <- factanal(item_scores_B, factors = 4, rotation = "varimax")
print(fitfact4, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores_B, factors = 4, rotation = "varimax")
## 
## Uniquenesses:
##   A1_B1   A3_B2   A4_B3   A5_B4   A6_B5   A7_B6   A8_B7   A9_B8  A10_B9 A11_B10 
##    0.90    0.73    0.73    0.71    0.75    0.72    0.76    0.58    0.64    0.69 
## A14_B12 A15_B13 A16_B14 A17_B15 A18_B16 A20_B17 A21_B18 A22_B19 A23_B20 A24_B21 
##    0.69    0.85    0.75    0.72    0.84    0.54    0.53    0.58    0.62    0.63 
## A26_B22 A27_B23 A28_B24 A29_B25 A31_B27 A32_B28 A33_B29  A0_B11  A0_B26  A0_B30 
##    0.55    0.61    0.63    0.83    0.72    0.76    0.81    0.88    0.72    0.85 
##  A0_B31  A0_B32 
##    0.79    0.75 
## 
## Loadings:
##         Factor1 Factor2 Factor3 Factor4
## A20_B17                  0.56          
## A26_B22          0.31    0.57          
## A1_B1                                  
## A3_B2    0.35    0.34                  
## A4_B3    0.31    0.40                  
## A5_B4    0.41                          
## A6_B5            0.32                  
## A7_B6            0.42                  
## A8_B7    0.37                          
## A9_B8    0.42    0.36            0.32  
## A10_B9   0.46                          
## A11_B10  0.34    0.36                  
## A14_B12  0.40                          
## A15_B13                                
## A16_B14  0.44                          
## A17_B15          0.43                  
## A18_B16                                
## A21_B18  0.44            0.47          
## A22_B19  0.40            0.41          
## A23_B20  0.39                          
## A24_B21          0.47            0.33  
## A27_B23  0.37            0.41          
## A28_B24  0.32                    0.34  
## A29_B25                  0.36          
## A31_B27                          0.33  
## A32_B28  0.33                          
## A33_B29          0.38                  
## A0_B11                                 
## A0_B26           0.39            0.33  
## A0_B30                                 
## A0_B31                           0.33  
## A0_B32                           0.41  
## 
##                Factor1 Factor2 Factor3 Factor4
## SS loadings       2.82    2.54    2.20    1.57
## Proportion Var    0.09    0.08    0.07    0.05
## Cumulative Var    0.09    0.17    0.24    0.29
## 
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 1203.85 on 374 degrees of freedom.
## The p-value is 7.2e-88
load4 <- tidy(fitfact4)
ggplot(load4, aes(x = fl1, y = fl2)) + 
  geom_point() + 
  geom_label_repel(aes(label = variable), show.legend = FALSE) +
  labs(x = "Factor 1", y = "Factor 2",
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

main_factors <- load4 %>% 
#  mutate(factorNone = 0.4) %>%  # add this to set the main factor to "None" where all loadings are below 0.4
  pivot_longer(names_to = "factor",
               cols = contains("fl")) %>% 
  mutate(value_abs = abs(value)) %>% 
  group_by(variable) %>% 
  top_n(1, value_abs) %>% 
  ungroup() %>% 
  transmute(main_factor = factor, variable)
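As a quick sanity check on this assignment, we can count how many items end up with each main factor; a minimal sketch using the main_factors table just created:

```r
# Sketch: number of items assigned to each main factor
main_factors %>% count(main_factor)
```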

load4 %>% 
  select(-uniqueness) %>% 
  # add the info about which is the main factor
  left_join(main_factors, by = "variable") %>%
  left_join(test_versions %>% select(variable = label, description), by = "variable") %>% 
  arrange(main_factor) %>% 
  select(main_factor, everything()) %>% 
  # arrange adjectives by descending loading on main factor
  rowwise() %>% 
  mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>% 
  group_by(main_factor) %>% 
  arrange(-max_loading, .by_group = TRUE) %>% 
  select(-max_loading) %>% 
  # sort out the presentation
  rename("Main Factor" = main_factor, # the _ throws a latex error
         "Question" = variable) %>%
  mutate_at(
    vars(starts_with("fl")),
    ~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
  ) %>% 
  kable(booktabs = T, escape = F, longtable = T) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>%
  kableExtra::kable_styling(latex_options = c("repeat_header"))
Main Factor Question fl1 fl2 fl3 fl4 description
fl1 A10_B9 0.456 0.276 0.229 0.157 trig.identity
fl1 A16_B14 0.44 0.03 0.131 0.206 diff.quotient
fl1 A9_B8 0.416 0.361 0.128 0.324 trig.ineq.
fl1 A5_B4 0.412 0.264 0.195 0.126 logs
fl1 A14_B12 0.403 0.225 0.277 0.13 limit
fl1 A23_B20 0.393 0.268 0.287 0.274 slope tangent
fl1 A8_B7 0.374 0.224 0.16 0.145 sine pi/3
fl1 A3_B2 0.346 0.34 0.175 0.022 arithmetic rules
fl1 A32_B28 0.326 0.082 0.208 0.288 FtoC algebra
fl1 A18_B16 0.269 0.098 0.193 0.2 product rule
fl2 A24_B21 0.137 0.47 0.136 0.334 IVT
fl2 A17_B15 0.152 0.43 0.139 0.23 graph f’
fl2 A7_B6 0.213 0.422 0.2 0.13 graph translation
fl2 A4_B3 0.309 0.4 0.057 0.085 Easy ineq.
fl2 A0_B26 0.115 0.388 0.086 0.333 FtoC graph
fl2 A33_B29 0.085 0.383 0.138 0.143 difference vectors
fl2 A11_B10 0.339 0.362 0.194 0.175 period
fl2 A6_B5 0.267 0.318 0.264 0.067 logs
fl2 A1_B1 0.025 0.28 0.143 -0.011 linear function
fl2 A15_B13 0.202 0.256 0.049 0.201 series
fl2 A0_B11 0.175 0.238 0.082 0.17 rational exponents
fl3 A26_B22 0.122 0.308 0.568 0.115 int(poly)
fl3 A20_B17 0.276 0.261 0.556 0.063 (exp)’
fl3 A21_B18 0.438 0.185 0.473 0.145 (ln(sin))’
fl3 A22_B19 0.402 0.197 0.414 0.22 hyp.functions
fl3 A27_B23 0.368 0.162 0.407 0.251 int(exp)
fl3 A29_B25 0.143 0.052 0.356 0.147 int even funct
fl3 A0_B30 0.093 0.135 0.26 0.232 normal vector
fl4 A0_B32 0.229 0.108 0.125 0.41 scalar product
fl4 A28_B24 0.322 0.287 0.26 0.343 Int = 0
fl4 A0_B31 0.112 0.178 0.246 0.328 vector product
fl4 A31_B27 0.293 0.283 0.097 0.326 int(abs)

The first factor is again calculations, but this time only in calculus (i.e. integrals and derivatives).

The second factor seems to capture more abstract material: it has to do with limits, rules for logarithms, and so on. This could plausibly be a category of its own.

The third factor is interesting: it is clearly about graphical interpretations, all in different settings but belonging together.

The fourth factor does not have a clear interpretation.
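Since the 1-factor model is nested in the 4-factor model, the two factanal fits reported above could also be compared formally with a chi-square difference test; a sketch, not part of the original analysis, using the STATISTIC and dof components that factanal() returns:

```r
# Sketch: chi-square difference test for the nested factanal fits
# (assumes fitfact and fitfact4 from the sections above).
chisq_diff <- fitfact$STATISTIC - fitfact4$STATISTIC  # 1-factor minus 4-factor
df_diff    <- fitfact$dof - fitfact4$dof
pchisq(chisq_diff, df = df_diff, lower.tail = FALSE)  # p-value for the improvement
```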

About this report

This report supports the analysis in the following paper:

[citation needed]

Packages

In this analysis we used the following packages. You can learn more about each one by clicking on the links below.

  • mirt: For IRT analysis
  • psych: For factor analysis
  • tidyverse: For data wrangling and visualisation
  • reshape: For reshaping nested lists
  • vroom: For reading in many files at once
  • broom: For tidying model output
  • fs: For file system operations
  • gt: For formatting tables
  • knitr: For markdown tables
  • ggrepel: For labelling points without overlap
  • skimr: For data frame level summary
  • ggridges: For ridge plots